---
title: "Content Engine"
type: concept
created: 2026-04-18
updated: 2026-04-18
sources: ["raw/articles/06-reading-telemetry.md", "raw/notes/memory.md"]
tags: [adaptive-content, pipeline, caching, gpt-4o, reader-app]
---

# Content Engine

The Content Engine is the end-to-end pipeline inside the [[Adaptive Engine]] (`adapt.readingtester.com`, port 3119) that transforms any story page into a version calibrated to a specific child's reading level. It handles caching, GPT-4o calls, self-correction, and fallback.

## What the Engine Produces

For any `(book_id, page_number, child_fk_level)` tuple, the engine outputs:

1. **Leveled text** — the page rewritten at the child's FK level + 0.5 ZPD offset
2. **Vocabulary hints** — words the child is likely to tap (fed to [[Learner Bot]] vocabulary gap tracking)
3. **Narration script** — the same text formatted for TTS delivery

The goal is that the child experiences the story as written for them — not a dumbed-down version, but a level-appropriate retelling with the same narrative arc.
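
As a rough illustration of the three outputs listed above, the response could be modeled as a small immutable record. This is a sketch only: the class name `AdaptedPage`, the field names, and the field types (for example, `book_id` as a string) are assumptions, not the engine's actual response schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptedPage:
    """Hypothetical response shape; names and types are illustrative."""

    book_id: str            # assumed to be a string identifier
    page_number: int
    target_fk: float        # child's FK level plus the ZPD offset
    leveled_text: str       # output 1: the rewritten page
    vocabulary_hints: tuple[str, ...] = ()  # output 2: words likely to be tapped
    narration_script: str = ""              # output 3: text formatted for TTS


page = AdaptedPage(
    book_id="book-42",
    page_number=3,
    target_fk=3.5,
    leveled_text="The fox ran to the big hill.",
    vocabulary_hints=("hill",),
    narration_script="The fox ran to the big hill.",
)
```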

## Full Adapt Pipeline

```mermaid
flowchart TD
    A["Reader App :3125\npage render request\n(learner_id, book_id, page)"] --> B["Resolve child FK level\nLearner Profile API\nfk_level = child.reading_level"]
    B --> C["Compute target\ntarget = fk_level + 0.5\ncap at 5.5"]
    C --> D{"Cache hit?\nbook_id + page_no\n+ round(target,1)\n+ sha256(source_text)"}
    D -->|hit| E["Return cached text\n← fast path"]
    D -->|miss| F["Short text check\n< 30 words?"]
    F -->|yes| G["Return original\nno GPT call"]
    F -->|no| H["GPT-4o call\nLevel page to target FK\npreserve vocab hints"]
    H --> I["Score result\nFK formula on output"]
    I --> J{"|actual - target|\n> 0.5 grades?"}
    J -->|yes - first attempt| K["Retry GPT-4o\ntighter constraints"]
    K --> L["Score retry result"]
    L --> M{"Still > 0.5\noff target?"}
    M -->|yes| N["Return first attempt\nlog deviation to DB"]
    M -->|no| O["Store in cache\nReturn leveled text"]
    J -->|no| O
    N --> P["Reader App renders\n(best-effort text)"]
    O --> P
```
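
The flowchart's control flow can be sketched in Python as follows. The constants come from the diagram; everything else is an assumption made for illustration: `adapt_page`, `level_fn` (a wrapper around the model call), `score_fn` (an FK scorer), and `log_fn` (a deviation logger) are hypothetical names, and the cache is modeled as a plain in-memory mapping rather than the MySQL table. The sketch logs whenever a retry runs, following the Self-Correcting Behavior section.

```python
import hashlib
from typing import Callable, MutableMapping

ZPD_OFFSET = 0.5   # grades above the child's level (from the flowchart)
TARGET_CAP = 5.5   # upper bound on the target (from the flowchart)
MIN_WORDS = 30     # shorter pages are returned unchanged
TOLERANCE = 0.5    # allowed |actual - target| in grades


def compute_target(fk_level: float) -> float:
    """Child level plus the ZPD offset, capped."""
    return min(fk_level + ZPD_OFFSET, TARGET_CAP)


def adapt_page(
    book_id: str,
    page_number: int,
    source_text: str,
    fk_level: float,
    cache: MutableMapping[tuple, str],
    level_fn: Callable[[str, float, bool], str],  # hypothetical model wrapper
    score_fn: Callable[[str], float],             # hypothetical FK scorer
    log_fn: Callable[[dict], None],               # hypothetical deviation logger
) -> str:
    target = compute_target(fk_level)
    digest = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
    key = (book_id, page_number, round(target, 1), digest)

    if key in cache:  # fast path
        return cache[key]
    if len(source_text.split()) < MIN_WORDS:
        return source_text  # too short to level; no model call

    first = level_fn(source_text, target, False)
    first_fk = score_fn(first)
    if abs(first_fk - target) <= TOLERANCE:
        cache[key] = first
        return first

    retry = level_fn(source_text, target, True)  # True = tighter constraints
    retry_fk = score_fn(retry)
    on_target = abs(retry_fk - target) <= TOLERANCE
    log_fn({"key": key, "target": target, "first_fk": first_fk,
            "retry_fk": retry_fk, "retry_on_target": on_target})
    if on_target:
        cache[key] = retry
        return retry
    return first  # best effort, as in the flowchart; not cached
```

Passing the model call, scorer, and logger in as callables keeps the control flow testable without network access, which is the only reason for that design choice here.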

## Cache Strategy

The cache table `leveled_text_cache` (in `adaptive_content` DB on shared MySQL) uses a compound key:

| Key Component | Purpose |
|---|---|
| `book_id` | Identifies the source book |
| `page_number` | Specific page within book |
| `rounded_level` | `round(target_fk_grade, 1)` — prevents fragmentation |
| `source_text_hash` | `sha256(source_text)` — detects if source was updated |

When the source text changes (e.g., a book is edited in the content CMS), the hash changes, so lookups no longer match the stale entry and it is bypassed automatically. Old entries are not proactively deleted; they remain as dead entries until their TTL expires or they are purged manually.
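
A minimal sketch of how the compound key behaves, assuming string book IDs and UTF-8 source text. `CacheKey` and `make_cache_key` are illustrative names, not the engine's actual code.

```python
import hashlib
from typing import NamedTuple


class CacheKey(NamedTuple):
    book_id: str           # assumed string identifier
    page_number: int
    rounded_level: float   # round(target_fk_grade, 1)
    source_text_hash: str  # hex sha256 of the source text


def make_cache_key(
    book_id: str, page_number: int, target_fk_grade: float, source_text: str
) -> CacheKey:
    digest = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
    return CacheKey(book_id, page_number, round(target_fk_grade, 1), digest)


# Nearby targets collapse onto one entry, which limits fragmentation.
a = make_cache_key("book-42", 7, 3.52, "Once upon a time...")
b = make_cache_key("book-42", 7, 3.48, "Once upon a time...")

# Editing the source text changes the hash, so the old entry no longer matches.
c = make_cache_key("book-42", 7, 3.52, "Once upon a time, long ago...")
```

Here `a == b` holds because both targets round to 3.5, while `a != c` because the source text differs.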

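**Cache effectiveness:** The library holds 9,089 books. Popular books at common FK levels (3.0, 3.5, 4.0) should have their cache entries filled quickly, after which most renders of those pages take the fast path. The cold-start cost is one GPT-4o call per `(book, page, level)` tuple, or two when the self-correction retry is triggered.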

## Self-Correcting Behavior

The engine scores every GPT-4o output with the FK formula before caching it. This is the same formula the bot uses to track child level:

$$FK = 0.39 \times \left(\frac{\text{words}}{\text{sentences}}\right) + 11.8 \times \left(\frac{\text{syllables}}{\text{words}}\right) - 15.59$$
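
The formula can be sketched in Python as below. The arithmetic in `fk_from_counts` follows the formula directly; the sentence splitter and the syllable counter are rough heuristics chosen for illustration (an assumption, since the source does not say how the engine counts syllables), so scores from `fk_grade` are approximate.

```python
import re


def fk_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    """Heuristic: count vowel groups, dropping a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text: str) -> float:
    """Approximate FK grade of a passage using the heuristics above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return fk_from_counts(len(words), len(sentences), syllables)
```

As a worked example, 100 words in 10 sentences with 130 syllables gives 0.39 × 10 + 11.8 × 1.3 − 15.59 = 3.9 + 15.34 − 15.59 = 3.65. Very simple text can score below zero, which is expected behavior for this formula.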

If the output FK deviates by more than 0.5 grades from target, a retry is triggered with a prompt addendum that specifies:
- Maximum sentence length in words
- Maximum word length in syllables
- Forbidden vocabulary complexity markers

Every retry is logged, whether or not the second attempt succeeds. The deviation log will feed a planned analytics dashboard for prompt quality monitoring.

## Integration with Reader App

The [[Reader App]] calls the engine on **every page render**, not in advance. This means:

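- First render of a given page at a given rounded level → cache miss → ~1–2s GPT latency
- Later renders of that page at the same level, by any learner → cache hit → <50ms (the cache key includes `page_number`, so each page of a book starts cold)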
- User sees a loading state only on cache misses

⚠️ **Non-Negotiable (Sig):** ALL content is adaptive. The engine is not optional. Every page of every book must pass through the adapt pipeline before rendering. Static display of original text is an intermediate state only.

## Vocab Hints and Learner Bot

When the engine processes a page, it extracts vocabulary hints — words likely to be above the child's level even after leveling. These are returned in the response alongside the leveled text and passed to the [[Telemetry Service]] on word tap events.

The [[Learner Bot]] aggregates word taps across sessions and builds a `vocabulary_gaps` record per learner. On the next nightly cycle, the bot uses this to:
1. Flag recurring unknown words
2. Generate targeted vocab exercises (future)
3. Adjust content recommendations
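
The aggregation step might look like the sketch below. The event shape `(learner_id, session_id, word)`, the function name `build_vocabulary_gaps`, and the "tapped in at least two distinct sessions" threshold are all assumptions for illustration; the source does not define what counts as a recurring unknown word.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

TapEvent = Tuple[str, str, str]  # (learner_id, session_id, word), assumed shape


def build_vocabulary_gaps(
    tap_events: Iterable[TapEvent], min_sessions: int = 2
) -> Dict[str, List[str]]:
    """Per learner, list words tapped in at least `min_sessions` sessions."""
    sessions_seen = defaultdict(lambda: defaultdict(set))
    for learner_id, session_id, word in tap_events:
        sessions_seen[learner_id][word.lower()].add(session_id)
    return {
        learner_id: sorted(
            word for word, sessions in words.items()
            if len(sessions) >= min_sessions
        )
        for learner_id, words in sessions_seen.items()
    }


events = [
    ("kid-1", "s1", "Enormous"),
    ("kid-1", "s2", "enormous"),
    ("kid-1", "s1", "lantern"),
    ("kid-2", "s3", "lantern"),
]
gaps = build_vocabulary_gaps(events)
```

Counting distinct sessions rather than raw taps avoids flagging a word that a child tapped several times on a single page.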

## Provider Architecture

The engine is provider-agnostic. The adaptation call is routed through a provider adapter layer:

| Provider | Status | Use Case |
|---|---|---|
| GPT-4o | ✅ Active | Leveling + translation in one call |
| Claude | 🔧 Coded | Fallback (not in routing yet) |
| DeepL | ❌ No key | Translation-only (bypasses leveling) |
| Google Translate | ❌ No key | Translation-only (bypasses leveling) |

The provider decision (GPT for combined level+translate vs. separate translate API) was made explicitly by Jixian: GPT handles both in a single call — simpler integration, fewer API keys, one less dependency.
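
One way to picture the adapter layer is a shared interface plus a router that tries providers in order. This is a hypothetical sketch: `Provider`, `ProviderRouter`, and the `adapt` signature are invented for illustration, and the fallback chain shows how a second provider could be slotted in later. Per the table above, only GPT-4o is routed today.

```python
from typing import List, Optional, Protocol, Sequence


class Provider(Protocol):
    """Hypothetical interface each provider adapter would implement."""

    name: str

    def adapt(self, text: str, target_fk: float, language: Optional[str]) -> str:
        ...


class ProviderRouter:
    """Tries providers in order and returns the first successful result."""

    def __init__(self, providers: Sequence[Provider]) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self._providers = list(providers)

    def adapt(
        self, text: str, target_fk: float, language: Optional[str] = None
    ) -> str:
        failures: List[str] = []
        for provider in self._providers:
            try:
                return provider.adapt(text, target_fk, language)
            except Exception as exc:  # any provider failure moves to the next one
                failures.append(f"{provider.name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(failures))
```

With this shape, the pipeline depends only on `ProviderRouter.adapt`, so adding a provider to routing would not change the calling code.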

## Related Pages

- [[concepts/adaptive-content/FK Leveling]] — FK formula, ZPD offset, API contract for level-page
- [[concepts/adaptive-content/index|Adaptive Content Engine]] — high-level overview
- [[entities/Adaptive Engine]] — service entity page (URL, port, DB)
- [[entities/Learner Bot]] — vocabulary gap consumer
